Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARQL Aggregation functions shouldn't build up memory for each row #678

Merged
merged 3 commits into from
Jan 12, 2017

Conversation

tiko-tiko
Copy link

TL; DR: I'd like to discuss:

  • style guide
  • missing comments
  • additional tests to conform to
  • implementation/buffer size of REDUCED
    Feel free to comment anything you like

Some context:
I have been experimenting with a few SPARQL queries which should do a lot of counting.
The idea was to find correlations among different predicates in order to find the most promising queries on a data set. One query yielded a few million combinations which caused rdflib even without the DISTINCT keyword to build up enough memory consumption to start swapping.

The cause seemed to be evalGroup() which appended each incoming row to a list. I thought of two different approaches to avoid this, an OO style approach and using python's generators with their send() method. Both seemed to have similar memory requirements so I decided for the OO style approach for better readability.

I tried to get some response on the IRC channel to ask for style guides and such but during the festive season there was little activity. So I used the style guide I am used to. Please point out what to change to conform to your style guide.

After the refactoring I fixed the code to pass all the tests again. Are there additional requirements which are not implemented via the test suite?

Also I tried to implement a generic algorithm for the REDUCED keyword although it is not directly related to my original problem.

@coveralls
Copy link

coveralls commented Jan 5, 2017

Coverage Status

Coverage increased (+0.07%) to 62.909% when pulling 636f685 on tiko-tiko:sparql-aggregates into 2a869ca on RDFLib:master.

@gromgull
Copy link
Member

Hi @tiko-tiko !

This looks very interesting! And well done on making sense of the SPARQL Engine code! Although I wrote a lot of it in the first place, whenever I look at it now I am often confused.

  • For style we're not very strict, mainly pep8
  • If the tests pass, this is good enough!
  • What comments are you missing? I mean, there are MANY comments missing - but anything in particular?

@tiko-tiko
Copy link
Author

Hi @gromgull !

thank you for your reply. Regarding the comments for example I know each class, method and function could have a docstring, On the other hand given the OO nature of my code this could be a little repetitive. I am looking for parts of my patch which need more explanation but my view on this matter is rather biased at the moment. I understand that eventually I will wish I would have made more comments.
Could please tell me which parts seem unclear to you, so I can clarify?

Thanks, tiko-tiko

@gromgull
Copy link
Member

Then I misunderstood - I though you asked about more comments in existing code. Your code is fine, I think comments are just waiting to be made wrong and outdated by changes to code anyway :)

@gromgull gromgull merged commit 941b739 into RDFLib:master Jan 12, 2017
@tiko-tiko tiko-tiko deleted the sparql-aggregates branch January 14, 2017 11:30
@joernhees joernhees added this to the rdflib 4.2.2 milestone Jan 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants